Mon. Not. R. Astron. Soc. 000, 000-000 (2008) 



Microlensing, structure of the galactic halo and 
determination of dark objects' mass function 



D. Markovic and J. Sommer-Larsen 

Theoretical Astrophysics Center, Juliane Maries Vej 30, DK-2100 Copenhagen 0, Denmark 



12 December 1996 



ABSTRACT 

We study the accuracy and systematic error of inference of massive halo objects' (MHO 
or 'Macho') mass function from microlensing events observed in the direction of Large 
Magellanic Cloud. Assuming the spatial distribution and kinematics of the objects are 
known, the slope and the range of the MHO mass function (modeled here by a simple 
power law) will be possible to determine from 100-1000 detected events if the slope 
is in the range — 2.5<a< — 0.5, with the statistical errors reaching their minima at 
a — —1.5. Outside this range the errors grow rapidly making the inference difficult even 
at very large numbers of events {N sa 10000). On the other hand, the average mass of the 
MHO's will be determined to better than about 30% accuracy from N « 100 events for 
any slope. Overall, we find that the accuracy of inference at fixed N will not be strongly 
affected by the presently available event duration-dependent detection efficiencies if the 
typical MHO masses are in the range (order of magnitude O.IMq) indicated by the 
events detected so far. 

We also estimate the effects of the uncertainty of the massive objects' spatial distribu- 
tion and kinematics on the determination of their mass function. The massive objects' 
halo models considered are all spherical but we allow for various density profiles and 
a radius-dependent, anisotropic velocity dispersion. We find that while the mass func- 
tion slope and range (i.e., the 'shape') are weakly affected for —2 < a < 0, the error in 
the average mass due to the halo structure uncertainties could be reduced to less than 
about 50% only through the detection of about 1000 or more events. Reliable estimates 
of the halo structure itself [density profile and (anisotropic) velocity dispersion profile] 
can start only at very large numbers of detections {N > 10000). 

Key words: Microlensing - Galactic halo ~ Macho mass function. 



1 INTRODUCTION AND OVERVIEW 



The admittedly large uncertainty stems in part from the still 
small number of events and thus a poor statistics, but also 



A deca de after Paczynski's ground-breaking proposal (1986) from our lack of knowledge [as made clear in (Alcock et al. 



several teams and considerable resources around the world 



are now devoted to searches for microlensing events [see re- 
cent overviews by Paczynski (1996) and Roulet & MoUerah 
(1996)]. These searches have been rewarded by by a large 
number (~ 100) of detections in the direction of the galactic 
bulge and already a si gnificant set of event s observed [1 or 
2 by the EROS team (|Aubourg et al. 1993[ ) and 6-8 by the 
MACHO team ( |Alcock et al. 1996D ] along the line of sight 
to the Large Magellanic Cloud. 

In the latter case microlensing is expected to provide 
us with direct information on th e composition of th e galac- 
tic halo. A statistical analysis ( Alcock et al. 1996| ) of the 
presently available data suggests that dark massive objects 
— the potential microlenses — could account for between 
30% and 100% of the total mass in the halo. In addition, 
their most likely masses should be in the range O.I-O.6M0. 



1996)] regarding the spatial distribution and kinematics of 



the massive halo objects (MHO). The import of the halo 
structure is due to the simple fact that the (inferred) mass 
of a lens scales as m ~ (r«n)^/DL(-Ds --Dl), where T is the 
observed duration of the microlensing event, Vn is the MHO's 
velocity orthogonal to the line of sight, Dl is the earth-lens 
distance and Ds is the earth-source distance. At the same 
time, th e integral even t rate which is roughly proportional 
to J py^ Dh{Ds ~ Dh)vndDL obviously incorporates details 
about halo beyond the mere total mass Mtot ~ / pd^DL. 
Therefore, the conversion from the observed quantities, the 
event durations and the rate, to information on MHOs' 
masses and their halo's total mass inevitably involves as- 
sumptions about the structure of the halo. 

A question arises at this point: should we expect — 
as might seem only natural — that, given a larger num- 
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ber of detections in a foreseeable future, the determination 
of the MHO mass function and the MHOs' fraction of the 
total halo mass could reach a significantly higher accuracy 
thus providing us with better clues regarding the origin and 
evolution of the halo. The present paper will address this 
question; 



been discussed by, e.g., Giudice, Mollerach & Roulet (1994) 
and De Rujula et al. (1995).] Along with the r~'^'^ density 
profile, the radius-dependent velocity dispersion anisotropy 
(see Section 2) adopted for our CS model describes rather ac- 
curately a well known stellar halo p opulation, the blue hor- 
izontal branch field stars, BHBFS ( jommer-Larsen, Flynn 



Re ;ently Mao and Paczynski (1996) studied the issue of fc Christensen 1994 ) . The CS model reflects the possibility 



MHOs' mass determination. By considering simplified 'toy' 
models (e.g., uniform spatial density of MHOs) they were 
able to estimate that a reliable determination of the mass 
function could be achieved if we had 100 or more events. 
They modeled the mass function by a simple power law 
dn/dm oc m" (as we shall do in this paper) and concluded 
that for a ^ —1.5 {a 3> —1.5) the high (low) mass end of the 
mass function will be difficult to probe. As they pointed out, 
their results depended on the assumption that the MHOs' 
spatial distribution and kinematics were known. 

In this paper we also at first assume that the halo model 
is known. The specific model used in Section 4 for the pur- 
pose of mass function inference is the isothermal sphere with 
a core (CIS). Although the underlying halo model is consid- 
erably more realistic in our case, the basic conclusion of Mao 
and Paczynski remains valid: in the vicinity of a = —1.5 
the slope and range (3 — logj^Q(mmax/?Timin) can be deter- 
mined with A*' ^ 100 events. However, as we shift away from 
a = —1.5, the error of determination grows very rapidly. 
In particular, for a positive slope a one needs A'^ > 1000 
events for a reliable inference. [It is to be hoped — per- 
haps not unrealistically — that the actual mass function of 
the MHOs will indeed correspond to a sufficiently close to 
-1.5.] A quantity that can be accurately {<,30% error) in- 
ferred with N > 100 at any slope is the average mass of the 
MHOs, i.e., the first moment of the mass function. We find 
that our results do not depend dramatically (see, e.g.. Fig. 
6) on whether the detection efficiency is flat (i.e. indepen- 
dent of event duration) or of the type presentl y available for 
the MACHO project's microlensing searches (Alcock et al. 
199d)~" 



The effects of halo structure uncertainty will receive our 
attention in the later part (section 5) of the paper. At the 
outset of section 5 we will perform the following 'experi- 
ment:' we will assume that a specific spherical halo model 
describes the actual halo accurately enough (this model will 
be called the 'real' one) . We will then ask what the effect on 
meiss function and MHOs' halo fraction determination is if, 
instead of the 'real' model, we use for the inference a differ- 
ent one, the isothermal sphere (chosen here in its singular, 
p ~ r~^, version). 

For the 'real' model we take a 'concentrated' sphere 
(CS) with a steeper density profile, p ~ r~^ *, and an 
anisotropic velocity dispersion. This density profile (for 
r ^ Rq , the radius of the Sun's orbit) is commonly asso- 
ciated with the stellar halo (or 'spheroid') that consists of 
old, metal-poor stars. While the local density of th e lu- 
minous halo is observed to b e low ~ lO~^M0/pc^ ( Bah- 
call, Sc fimidt fc Soneira 1983 ), it is possible that the stel- 
lar halo has a signifficantly more massive, dark (though 
plausibly baryonic) counterpart of a similar structure. [A 
massive, dark 'spheroid' of local density ~ W~''^ Mq/pc^ 
was proposed in the past as a dynamically rele v ant com- 
ponent of the galax y ( ICaldweU fc Ostriker 1981 ; |Rohlfs &] 
Kreitscimann 1986). Microlensing by 'spheroid' objects has 



that, just like the spatial distribution of the BHBFS, the 
massive objects' distribution may differ from that of the to- 
tal (luminous -I- baryonic + nonbaryonic dark matter) halo 
mass. 

As a result of our 'experiment' we find that, although 
Q and /3 are weakly affected by the halo model ambiguity, 
the inferred p is on the average about 60% greater than 
the real value. It is important that unless N <; 1000, the 
statistics based only on event durations does not allow us to 
distinguish between the two massive objects' halo structures; 
the differences between them are submerged in the statistical 
noise. 

These results indicate that the uncertainty of the in- 
ferred average mass will be difficult to reduce below about 
the factor of 2. A similar conclusion, with similar magni- 
tudes of relative errors, holds for the massive object halo's 
local density po and the total halo mass between the so- 
lar orbit and LMC, Mtot- The unresolvable (at N < 1000) 
ambiguity is characteristic of a broad range of halo models 
that one might choose as the 'real' ones instead of CS, al- 
though the ensuing uncertainty of the inferred average dark 
objects' mass and their total halo mass may be smaller than 
that obtained in the specific case of the CS/SIS ambiguity. 
[Since the inferred halo density in the vicinity of the Sun 
for the CS model is about twice the value for the isother- 
mal sphere, it may be possible to rule out the more con- 
centrated (e.g. p ~ r~^ * at t^Rq) halo models on the 
basis of dynamical arguments even at A < 100 if the MHO 
halo indeed turns out to be very massiv e, i.e., approaching 



the total halo m ass. For more detail see (Sommer-Larsen & 



Markovic 1997).] These results are supported by our max- 



imum likelihood simulations presented in Section 5, where 
the (appropriately parametrised) halo model is treated as 
unknown and its parameters are varied together with the 
mass function parameters: only at A ^ 1000 events do the 
errors in p, po and Mtot fall below 50%. 

Given the current detection rate of a few events per 
year, and even hoping that it could be increased in the fu- 
ture, the very large number of events needed does not sup- 
port optimism regarding an accurate inference of MHOs' 
masses and their halo fraction on the basis of LMC events 
only. Probing the halo at various angles relative to the cen- 
ter of galaxy, e.g. through the observation of events toward 
M31, could help distinguish between different halo models. A 
considerable improvement might come from gaining more in- 
formation from individual events by, e.g., parallax measure- 
ments (Refsdal 1966; Gould 1992, 1994, 1995b) that could 
put tight constraints on MHOs' spatial distribution and/or 
kinematics. 

This paper is organised as follows. In Section 2 we re- 
view the halo models that will be used in the subsequent 
sections. Section 3 outlines the derivation of microlensing 
rates. In Section 4 we discuss the MHOs' mass function de- 
termination errors assuming a known halo model while in 
section 5 we study the effects of halo model uncertainties. 
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Finally, the Appendix provides some statements and formu- 
lae of mathematical statistics (along with the derivations) 
that are extensively used in the main body of the paper. 



2 MODELS OF MHO DISTRIBUTION AND 
KINEMATICS 

In this paper we will consider a range of models of the mas- 
sive halo objects' distribution and velocities. For simplicity, 
we will restrict our discussion to spherically symmetric ha- 
los. One of the most commonly used models is the isothermal 
sphere with the velocity dispersion constant throughout the 
halo the density profile which is well approximated by 



p{r) 



Po- 



(1) 



where a « 5 kpc is the 'core' radius and Rq = 8.5 kpc is 
the distance of the Sun from the galactic centre. Assum- 
ing that the total (luminous -I- dark matter) halo density 
is distributed according to expression (|l]), one obtains the 
observed (approximately) flat rotation curve for the galaxy. 

The HMO mass distribution, however, need not follow 
that of the total halo mass. One possibility is that the mas- 
sive objects may be more concentrated toward the galactic 
centre. We may thus follow the clues provided by recent ob- 
servations (Sommer-Larsen, Flynn & Christensen 1994) of 



the blue horizontal branch field stars (BHBFS) in the outer 
halo. These observations imply that the velocity dispersion 
changes from (3 = 1 — ag/a^ > at smaller distances from 
the centre of the Galaxy to /3 < at larger distances. The 
radial velocity dispersion is well described by the analytic 
fit 



■ tan 



(2) 



where the best agreement with the observations is achieved 
with (Jo — 80kms~^, a+ = 145kms~^, ro = 10.5kpc 
and I — 5.5 kpc. The BHBFS halo is close to spherical 
with the density that is well modeled by the power law 
p — po{RQ/ry' , where 7 « 3.4. 



From the Jea ns' equation for spherical systems ( Binney 
fc Tremaine 1987| ) 



1 d{pG'i) 2/3(7 



dr 



+ ■ 



dr ' 



(3) 



where <l>(r) is the gravitational potential, we obtain the tan- 
gential velocity dispersion 

r da^ 
^ 2~d^' 



2 



2 " 



- 1 



(4) 



where Vc = {—rd^/dr)^^^ is the (roughly constant) rotation 
velocity. Notice that given both the negative slope of the 
radial velocity dispersion 



da^ 
dr 



1 r 



TT / 1 + [(r - ro)//]2 ' 

and 7 > 2, the tangential velocity dispersion will be smaller 
than in the case of an isothermal sphere {a = 2, (Jr = const, 
at large radii) with the same Vc [see Fig. 1]. This increased 
'pressure support' is merely a consequence of the collisionless 
Boltzmann equation and — in the final instance — the con- 
servation of the phase-space volume (Liouville's theorem). 
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Figure 1. Anisotropy parameter (3 and velocity dispersion for 
halo model CS as functions of the distance d from the Earth in the 
direction of LMC; is given by the solid line, at by the dotted 
line and <Tx [see paragraph preceding equation ( |ll[ )] by the dashed 
line. The straight solid lines correspond to the singular isothermal 
sphere, SIS {f3 = 0, a = 156 km/s) . 



It is realistic enough — and will prove quite convenient 
in the following — to model the velocity distribution by the 
Gaussian 



f{Vr,Ve,V4,) = 



1 



1 



(27r)3/2 ara- 



■ exp 



2 I 2 



(6) 

where and at are given by equations (^) and (^) for power- 
law density profiles. 

In this paper we will adopt the following nomencla- 
ture for models of the massive objects' halo: for the power- 
law halo with 7 = 2 and an isotropic velocity dispersion 
we will use the familiar name, singular isothermal sphere 
('SIS'); if the density profile has a core (|l]) and the velocities 
are still distributed according to an isotropic version of (^ 
with constant — <Jt = Vc/V2, we will use the shorthand 
'CIS' (strictly, this model does not satisfy Jeans' equation); 
the massive objects' halo model corresponding to that of 
BHBFS with the power-law density profile 7 = 3.4 and the 
dispersion given by (^ and Jeans' equation will be called 
the 'concentrated sphere' ('CS'). 
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Figure 2. Coordinates and angles used in the derivation of the 
lensing events rate. 



3 MICROLENSING STATISTICS 

The magnification of an LMC star due to the crossing of a 
mas sive object near the line of sight from the earth is given 
by (Paczynski 1996) 



+4)1/2' 



where 



6' + vlt" 



1/2 



(7) 



(8) 



In the above equation b is the impact parameter of the mas- 
sive object relative to the line of sight, is object's velocity 
orthogonal to the line of sight and 



Dxil - x) 



1/2 



(9) 



is the Einstein radius, where D is the distance of the LMC 
from the earth, xD (0 < x < 1) is the object-earth distance, 
m — ^Mq is object's mass and te = 3.2 x 10® km. 

We define the duration of a microlensing event as 
T = RE/vn- This is a measurable quantity; it can be ob- 
tained as soon as one knows the maximum magnification 
^max = A{u — Umin = b / Re) and, say, the time inter- 
val bet ween half-magnificati ons A — Amax/2 [see, e.g., (De 
Riijula, Jetzer & Masso 1991)]. The maximum magnification 



and duration are the only two measurable quantities while 
they depend on four parameters: m,Vn,x,b. Any informa- 
tion on properties of the massive object halo can thus be 
obtained only through a statistical analysis of a sufficiently 
large number of events. 

The statistical analysis proceeds from the rate of mi- 
crolensing events based on plausible models of dark-mass 
halos, such as outlined in the last section. Since the differ- 
ential cross section per unit distance xD for a massive object 



to pass the line of sight between impact parameters Reu and 
Re{u + du) is 2REdu, the rate in the more general case is 
given by 



N, Ddx J 



7 dn „ 
dfi— 2Re 
dfi 



du„ 



fn{Vn)VndVn 



(10) 



where A'', is the number of simultaneously observed stars, 
dn/djj, is the differential number density of the massive ob- 
jects, /n is the probability distribution of velocities orthog- 
onal to the line of sight and Uth is the threshold for success- 
ful detection [the maximum amplification is required to be 
greater than the threshold value, j4niax(^tmin) > j4niax(uth)]. 

In equation ( p^ we ignore the motions of the earth and 
the source orthogonal to the line of sight. While enhanc- 
ing the total event rate by only a few percent in the case 
of sources in LMC, the observer's and so urce's transv ersal 
motions lead to typically shorter events (Griest 1991). Al- 
though a proper analysis of the results of actual microlensing 
searches would have to take this effect into account, we will 
assume in this paper that the oberver and the sources are 
stationary. This should suffice for an analysis that attempts 
to isolate the effects of the halo objects' mass function, spa- 
tial distribution and kinematics. A more complete treatment 
would not change significantly our main conclusions regard- 
ing the accuracy of the mass function inference; the consid- 
erably greater computational effort does not seem necessary 
at this point. 

Both fn{vn) and dn/dfi depend on the chosen model of 
the massive object halo. In order to describe the velocity 
distribution at a point at distance xD along the line of sight 
from the earth, we introduce a local coordinate system (see 
Fig. 2) so that the x axis is orthogonal to the line of sight 
and lies in the plane given by the earth, the center of the 
Galaxy and LMC, while the z axis points toward LMC. The 
distribution function (^ can now be rewritten in terms of 
the velocity components Vx, Vy and Vz, 



f(Vx,Vy 



(27r)3/2 cr,a2 



exp 



{cosS Vx + shiSvzY (sin 5 — cos 5 Dz 



(11) 



where sin 5 — (_RQ/r)sint and — Rq + (xD)'^ — 
2xDRqcosl (t = 82° for the LMC). The two-dimensional 
velocity distribution in the plane orthogonal to the line of 
sight is then 



f{Vx,Vy,Vz)dVz 



2-Ka^at 



exp 



_ ^ J 2. 



(12) 



where = cos'^ (5 Ut + sin^ 5 . It is now straightforward to 
obtain the distribution for velocity = \ cos </> + sin Vy \ 
orthogonal to the line of sight 



/n(Wn) = / /t (?;n COS sin (?!)) Wn rf<^ 
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27r(JxCrt 



-Vnio 



1 + 



(13) 

where we have used the identity for the Bessel function of the 



zero'th order 2nIo{x) = J^^ exp {x cos 0) d(l> ( Abramowitz & 



Stegun 1972). Of course, for an isotropic velocity dispersion 



one recovers the familiar Maxwell distribution. 

As we indicated in the last section, the massive object 
number density will be modeled either as the power law 



rio = noH{x), 



where 



H{x) 



l+x" 



D_ 



2x^— cos t 

Rq 



-7/2 



(14) 



(15) 



or as the modification of ( |l5| ) for 7 = 2 that includes a core 
(0). Throughout this paper we will assume that the mass 
function drio/dfi is independent of the position in the halo. 

Recalling that velocity is related to t he duratio n of a 
microlensing event by = Re/T = r^y/iixll ~ x)/T, we 
can change the variable of integration in (hoh from Vn to T 
and thus obtain the rate of event detection 

V^2N,Drlu,,, j dfi^ j dTe{T)F{ii/T^), (16) 
where 

FitJ^/T^) = [-^y j\x{l-x)f^H{x) 



X fu 



^X{l-X)fi/T^ 



dx. 
(17) 



In the last expression we have introduced the detection efh- 
ciency e(T), i.e. the fraction of all events of duration T and 
satisfying u < Uth that will be detected with the available 
techniques. In this paper we will use the following approxi- 
mate expression for the detection efficiency 



e(T) 



0.3 e-('"(^/^'"»^ ''/2-'5'*, 
0.3 e"!'"*^/^"*!'"''^'^'' 



T <Tm 



(18) 



where Tm = 75 days. This expression is in good numerical 
agreement with the efficiency quoted by the MACHO team 
(Alcock et al. 1996) for their two-year LMC microlensing 
detection results. 

As a model for the massive o bjects' mass function w e 
choose a simple power law [see e.g. (Mao & Paczynski 1996)] 



p^{fJ-)dfi = 



C/3(a) 



(19) 



for the probability that the mass of a star lies in the interval 
M0d/i. In the simplest possible case the mass function is 
specified, apart from the exponent a, by the range of masses 
P = logio(/imax/Mmin), whcrc /.imax aud fi^in are the upper 
and lower limits of the range, and by the geometric mean 



fio = (Mmax/^min)^''^. The normalisation constant in equation 
( p^ is then given by 



/?lnlO 



Q = -1, 



Cilia) = 



(20) 



while the massive objects' mass density near the Sun is 
related to their number density by po — nofJ,oMQC'f3{a + 
1)/Cp{a). 



4 DETERMINING MHO MASS FUNCTION: 
FIXED HALO MODEL 

Given a sufficient number of detected microlensing events, 
one can attempt to infer the mass function of the lensing 
massive objects. In this section we will estimate the accu- 
racy of such an inference. In different words, we will try to 
estimate just what is the 'sufficient' number of events for a 
reliable mass function determination. 

In this section we make the important assumption that 
the halo structure, given by MHO's density profile and kine- 
matics, is known (the consequences of the halo uncertainty 
will be discussed in the next section). We then simulate the 
inference of the mass function parameters fio, a and /3 from 
samples of a fixed number A'^ of microlensing events of dura- 
tions Ti, i = 1, A*'. We use the maximum likelihood method 
[for an alternative meth od of mass function inference, based 
on mass momenta see ( De Rujula, Jetzer fc Masso 1991 ; 



Jetzer & Masso 1994; Jetzer 1996)], where we maximise the 



function 

Z({r.}|^„, Q, /3) = ElLo lnP(r.|M„, Q, /?) 



(21) 



with r espect to the parameters /Xo, Q and /3 [see (Glould 



1995a) for a different yet equivalent formulation of the 
method]; we denote the values of parameters at the max- 
imum of I by /io, OL and /3. In equation ( pl| ) P(r|/io, a, /3) is 
the normalised (J P(T| • ■ •)dr — 1) differential event rate, 
P(r) oc dVjdT. 

Given the power-law model of the MHO mass function 
(po|), the rate V [see equation ( p^ ] can be rewritten as 

r = 2A*Dr|iithno / dT 



e{T)T 



2{q + 1) 



0^/2 



y"F{y)dy, (22) 



where y = fi/T^. 

In both panels of Fig. 3 we have assumed that the 
MHO distribution and kinematics are well described by the 
isothermal sphere with a core [CIS, see equations (^ and 
(^)] and that we use just this (the 'real') halo model for the 
MHO mass function inference. The underlying ('real') mass 
function in the left-hand panel is given by fio ~ 0.86, a — —2 
and P = 2, while that in the right-hand panel corresponds 
to Ho — 0.06, a = 1 and (3 = 2. (The average mass in both 
cases is p, = 0.4.) The left columns of each of the two panels 
show results for e{T) = const., while in the right-hand side 
columns the detection efficiency is assumed to be of the form 
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Figure 3. Dependence of variances ct^q, (Ta and a/j on the number of detected events for fio = 0.86, a = —2 and (3 = 2 (left panel) 
and Ho = 0.06, a = 1 and /3 = 2 (right panel). [In both cases the average mass fi = O.4.] In each panel the left column is based on a flat 
[e =const.) detection efficiency, while the right-hand column assumes the MACHO-type efficiency ([l8|). The solid circles connected by 
dotted lines represent the variances as obtained by Monte-Carlo simulations. The mean shift of the Monte-Carlo inferred values relative 
to the 'real' parameters (bias) is given by squares connected by short-dashed lines (positive bias) or triangles connected by long-dashed 
lines (absolute value of negative bias). The variances in the Cramer limit are given by solid lines. 



In the figure we compare results of Monte-Carlo simula- 
tions (root-mean-square variations from the mean values of 
the inferred parameters are given by solid circles connected 
by dotted lines) for N = 100, 1000 and 10000 with the er- 
ror estimates obtained in the so called Cramer limit (solid 
lines). The Cramer limit error f'^(Cfj) in the determination 
of parameter from A'' data points is given by 



(23) 



'th 



where the expression on the right-hand side is the 
component of the inve rse of the information matrix [see Ap 
pendix, equation (A9)] 



d 



lnP(T|c) 



d 



\nP{T\c] 



X P(r|c) dT, 



(24) 



where c denotes the three parameters fj,o, a and 13. 

For the negative slope (a = —2) the errors in a and 
P are rather small at A*' > 100 and, not surprisingly, the 
Cramer limit is in good agreement with the results of Monte- 
Carlo simulations. As for po, a respectable accuracy can be 
achieved only for N > 1000 if e is flat or for even a larger 
number of events if the detection efficiency is of the MACHO 
type. Notice, however, that convolution with a MACHO- 
type e{T) typically results in a relatively moderate error 
increase at a fixed number A'^ of detected events. Still, the 
parameter estimation would take considerably more obser- 
vation time due to the simple fact that this detection effi- 
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Figure 4. 'Icr' (68% confidence level) and '2cr' (95% confidence) 
contours obtained by Monte-Carlo simulations of a and /3 infer- 
ence for both a = —2 and a = 1. The third parameter, fio, has 
been 'integrated ' over. The panels show results for (left to right) 
100, 1000 and 10000 events. In all cases /3 = 2, ^ = 0.4 and 
e{T) = const. Notice the smaller size and more rapid shrinking 
with increasing N in the case of the contours centered on a = — 2, 
(3 = 2. At N = 1000 and especially at TV = 10000 the contours 
are tightly concentrated around the 'true' value a = —2. 



ciency allows us to detect only 1/4 to 1/3 of all microlensing 
events, given the mass range discussed in this paper. 

The errors are much larger in the positive slope {a = 1) 
case. The determination of the slope itself is rather inaccu- 
rate, (Jo, /I a I > 1, unless A*' approaches 10000 (see the Monte- 
Carlo results). Although only moderately large (cr^ « 1) at 
small numbers of events (A'' ~ 100), the uncertainty of /3 de- 
creases very slowly for larger A''. In addition, the statistical 
error in fio is hopelessly large even for the largest number of 
events considered in Fig. 3 (notice that /io = 0.06). 

The rather grim prospects for the inference of the a = 1 
mass function are further illustrated by the contour plots of 
Fig. 4. Closely related to the large errors, in about 30% of 
the Monte-Carlo simulations at 100 events, the maximum 
likelihood procedure results in a delta-function best fit for 
the MHO mass distribution. [This fraction falls to 2-3% at 
Af=1000, and then to 0% at A/'=10000.] This is indicated 
by either the inferred a tending to -l-oo or (much less fre- 
quently) P approaching 0. In other words, the statistical 
errors are so large as to have a good chance of concentrating 
events' duration scatter to what may look like an effect of 
a single-mass MHO population. These cases were excluded 
when the statistics of Monte-Carlo results was computed. 

The irregular shape and great extent of the contours ex- 
plains why Cramer limit is such a poor approximation in this 
case. Indeed, the Cramer limit depends only on the proper- 
ties (more specifically, the first derivatives with respect to 
the parameters c) of the distribution function in the imme- 
diate vicinity of the 'real' parameters. In the a — 1 case the 
Cramer limit gives large errors indicating that the function 
is very insensitive to small changes in the parameters. How- 
ever, once we shift away from the original parameters, the 
nonliner distorsions of the distribution function are probed 
and this may result in smaller deviations of the inferred pa- 
rameters then one would expect on the basis of the Cramer 
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Figure 5. Histograms of inferred average masses p obtained 
by Monte Carlo simulations for the 'real' parameters fio = 0.86, 
a = —2 and /3 = 2 (thus p, = 0.4) in the case of uniform sensitivity 
(left) and the MACHO-type sensitivity [equation (vW] (right) for 
N = 100 (top), 1000 (middle) and 10000 (bottorn) events. The 
numbers in the upper right corners indicate mean shifts (b) of the 
inferred parameters relative to the 'real' values, root-mean-square 
variances (ct) with respect to the means and the Cramer-limit 
errors {(j'^). The underlying halo model is the isothermal sphere 
with a core (CIS). 



limit estimates. [This is indeed the case for fio and /? at 
a — 1 (see again Fig. 3; the Cramer limit is given by the 
solid lines).] Since the distribution function P{T\c) is intrin- 
sically so little dependent on small changes in the parameters 
c, convolving with a MACHO-type detection efficiency e{T) 
will not change the estimation accuracy significantly, as can 
also be concluded from Fig. 3. 

The very large errors in the estimation of fio both for 
Q = — 2 and (especially) a = 1 prompt us to seek a com- 
bination of the three parameters po, ct and (3 that can be 
inferred with greater accuracy. Not surprisingly, the average 
mass fi (the first moment of the MHO mass distribution) 
is just such a quantity. In Fig. 5 we plot the histograms of 
inferred Jl for the negative slope (a = —2) case obtained 
by the same Monte-Carlo simulations (400 simulations for 
each set of parameters, as is sufficient to obtain a relatively 
smooth distribution of points in the ct -(3 plane) whose re- 
sults were shown in Figs. 3 and 4. We can see that even for 
A'^ — 100 events the average mass can be inferred with decent 
accuracy. Notice that the Cramer-limit errors are in good 
agreement with the Monte-Carlo results. The same conclu- 
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Figure 6. The dependence of the Cramer-hmit errors on a. 
The errors (normaUsed to 100 events) for a 'broad' range, j3 = 2, 
mass function are shown as the soUd [s{T) = const] and dot- 
ted [MACHO-type detection efficiency, e{T) T^const.] Hnes. For a 
'narrow' mass range, /3 = 1, the errors are shown as the dashed 
and dot-dashed hnes for the respective detection efhciences. 



sion — as well as similar values of the errors — holds for 
the positive slope a = 1. 

As we could see in Fig. 3, Cramer limit gives a poor es- 
timate of the actual maximum-likelihood errors when those 
errors are large. However, if in some regions of the parame- 
ter space we detect peculiarly large Cramer-limit errors, that 
implies (as we have seen above) a low sensitivity of the dis- 
tribution function P(r|c) to small shifts in the underlying 
parameters and correspondingly indicates larger maximum- 
likelihood errors. By contrast, if the Cramer-limit errors are 
small they can indeed be used as realistic estimates of the 
actual inference uncertainties. 

In Fig. 6 we show Cramer-limit errors, normalised to 100 
events as functions of the 'real' a. The 'real' /3 is chosen to be 
2 ('broad' MHO mass function) or 1. Apart from the larger 
errors for the MACHO-type e{T) compared to the flat e 
errors, another easily predictable feature is the much poorer 
accuracy of the slope a determination for a narrow-range 
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Figure 7. Changes in curves 

[TP(T)]l/2 resulting from small 
shifts in a (dotted) and /3 (dashed) for a = —3/2 (left) and 
/3 = —1/2 (right). The values of parameter shifts arc indicated in 
the figure. The reference curves (solid lines) are chosen to have 
/3 = 2. Notice that we had to choose larger magnitudes of param- 
eter shifts in order to produce discernible changes for a = —1/2. 
£(T) = const, is assumed. 



(/3 = 1) mass function in comparison with the broad-range 
function: the slope is sampled much better in the latter case. 

Somewhat less obvious is that the accuracy of shape 
(slope a and range /3) determination should peak as sharply 
at a = —1.5 as we observe in Fig. 6. Indeed, one would ex- 
pect large uncertainties in the slope and range determination 
if the slope a is of large magnitude — positive or negative — 
thus bringing the mass distribution close to the delta func- 
tion limit. The fact that the transition region between the 
two extremes is centered on a — —1.5 can be understood as 
follows. 

For large values of /3 expression TP{T) as a function of 
of InT has a broad plateau [d{TP{T}) /dlnT « 0] for a = 
—3/2 as can be seen from equation (P2|): since F{y) ~ for 
y < (cr/rE)^/A and F{y) ~ 1/y for y > {a / r^)'^ / Jj, (cr is typ- 
ical velocity dispersion), the last integral depends relatively 
weakly on T for lQ-l^^'^fi^''^rTi/G < T < lO"/ V'^^rE/cr. 
Since the components of the information matrix (M) can be 

viewed as scalar products of the derivatives I ^JtP{T) J 

(where InT is used as the measure), larger distorsions of 
yjTP{T) resulting from small changes in the underlying pa- 
rameters will lead to smaller errors of parameter determina- 
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tion. This is illustrated in Fig. 7 where we show the curves 
[rP(r)]^/^ for /3 in the vicinity of 2 and mass function slope 
near a = —3/2 (upper and lower left) and a = —1/2 (upper 
and lower right). The dotted (A/3 = 0) and dashed {Aa = 0) 
curves show the distorsions resulting from shifts in the pa- 
rameters as indicated in the figure. Although l3 is not large, 
and accordingly, the 'plateau' is not very broad, one can 
clearly recognise the effects of a being equal to —3/2: it is 
due to the relatively large spread in In T (plateau) that the 
curve is so sensitive to small parameter changes. In addition, 
the consequences of the shifts in a and (3 can be clearly dis- 
tinguished. Thus for Aq — 0.3, the mode of the c urve shi fts 
towards longer times; the rough symmetry of yjTP{T) at 
the plateau is broken and now longer events dominate. By 
contrast, the change in 13 only supresses (A/3 > 0) or sharp- 
ens (A/3 < 0) the curve without shifting its mode. On the 
other hand, for a = —1/2 a positive shift in a is closer to 
producing effect similar to that of a negative A/3. This de- 
generacy gives rise to a small value of the determinant of 
the information matrix I^v which enhances the errors in a 

and /3. 

The flatness of ^JTP{T) in the vicinity of a = -3/2 
corresponds to roughly equal contribution of the M HO's 
with m asses n> ii p and the ones with fi < fio [see also ( Mao 
& Paczi'nski 1996)] to the total event rate. As we move to- 
ward larger a, the rate is dominated by larger masses (or, 
if we reduce a, by smaller masses). This simple fact helps 
explain an interesting feature of the bottom panel of Fig. 6: 
while at Q = —3/2 the error in (5 is slightly lower for /3 = 2 
than for /3 = 1 (see argument above: the plateau is broader 
for f3 — 2), this error increases more rapidly for (3 — 2 as 
we shift away from a = —3/2. This is a consequence of the 
'tail' (in terms of contribution to the event rate) of the mass 
function being less accessible [i.e. more distant from the (ef- 
fective) concentration of masses at the opposite end] in the 
case of the larger j3. 
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Figure 8. The differential event rate (upper panel) for fLo = 0.4, 
do = —2, l3o = 2 and the 'real' halo given by the 'CS' model 
(dashed line) and its closest SIS match {p. = 0.630, a = —1.874, 
(3 = 2.035) [solid line] obtained by maximising ( ^^ . In the lower 
panel, /to = 0.4, «o = 1, /3o = 2, while the SIS-based best fit is 
fl = 0.624, a = -0.188 and, /3 = 1.415. The two curves in the 
latter case are virtually indistinguishable. 



5 EFFECTS OF THE UNCERTAINTY OF 
MHOS' SPATIAL DISTRIBUTION AND 
KINEMATICS 

In this section we will study the extent to which the infer- 
ence of the MHO mass function will be hampered by the 
unavoidable uncertainty of the massive objects' halo struc- 
ture, i.e. their distribution and kinematics. For simplicity, 
we will consider only spherically symmetric halos. 

At first, we will estimate how much the inferred pa- 
rameters of the mass function could be offset systematically 
relative to the real values if instead of fitting the event du- 
ration distribution function Po{T) based on the 'true' halo, 
we use the distribution function P{T) based on a 'false' halo 
model. As we explain in the Appendix, a reasonable estimate 
of this shift in the parameters can be obtained by finding the 
mass function parameters c for which the expression 

*(c|co) = y Po(T|co)lnP(T|c)dr (25) 

is maximised (co denotes the 'real' mass function parame- 
ters). 

In Fig. 8 we show distributions PoiT) based on the 
'concentrated sphere' (CS) halo model along with their best 



matches from among the distributions P{T) based on the 
SIS model for parameters indicated in the figure caption. 
[Here as throughout this section we use £{T) — const.] While 
the two curves in the upper panel are close to each other, the 
ones in the lower panel are virtually indistinguishable. This 
is a typical situation for a broad range of the ('real') mass 
function parameters: by shifting the mass function parame- 
ters we can mask to a great extent the effects of changes in 
the halo model. Conversely, as long as we lack independent 
information about the MHOs' distribution and kinematics, 
there always will be significant errors in the determination 
of the mass function from event durations only. The rela- 
tively less accurate match of the two curves in the upper 
panel is characteristic of the values of a that allow an accu- 
rate inference of the mass function according to the results 
of the last section. We will discuss below in this section how 
we can take advantage of this residual difference to obtain 
a more accurate measure of the mass function. At present 
we will assume that we are indeed in error regarding the 
MHO distribution and kinematics and are thus using a false 
(SIS) halo model instead of the right (CS) one. What are 
the magnitudes of the systematic errors that ensue? 

In Fig. 9 we can observe the dependence of the system- 
atic shift, described above, on the value of Oo for /3o = 1 
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Figure 9. Systematic shift (bias) in the inferred parameters rel- 
ative to their 'real' values as a function of the real Oq. It is as- 
sumed that the real halo is of the 'CS' type, while the singular 
isothermal sphere (SIS) is used in the inference of p,, a and /3. 
For the dotted lines /3o = 2, while for the solid lines /3o = 1. In 
all cases {p.)o = 0.4. The curves were obtained by maximisation 
of expression (^). The stars indicate the results of Monte- Carlo 
simulations. The vertical segments bordered by stars in the plot 
of il give the la error for the Monte-Carlo simulations. 



(solid curve) and /3o = 2 (dotted curve). These results were 
obtained by maximising expression (ESI) for £(T) — const. 

We first notice the upward shift in the inferred av- 
erage mass p. by about 60%. The inferred mass scales as 
fj, ~ a'^/{x{l — x)), where a is the typical velocity disper- 
sion in the halo and {x{l — x)) denotes the average of the 
quantity over the differential event rate. By using the SIS 
instead of the 'right' CS halo model, we gain a factor of 
2 — 3 due to the increased a (see Fig. 1), while the mass es- 
timate is reduced due to the larger {a;(l — a;)) (the SIS halo is 
more extended). The net result is a shift in fi that depends 
only weakly on ao and fio- [The segments bordered by stars 
show results of Monte-Carlo simulations (10000 events) up 
to their statistical errors obtained from 400 simulations. The 



agreement with the results obtained by minimising (^5|) is 
good but not perfect. See Appendix for a brief discussion of 
the limitations of the method based on equation (p5[).] 

We also notice that for —2 ^ Oo ^ both a and /9 are 
relatively weakly afi'ected by the uncertainties of the halo 
model. This may be expected on the basis of our discussion 
(see the last Section) of the 'plateau' in P{T) for values of 
Qo near —1.5. The presence of the 'plateau' is independent 
of the halo model; it is a robust property carried over from 
Po(T) to P{T). (Our results for various values of the 'true' 
halo parameters, 7, CTo, <j+, To and I, indicate that a and /3 
are generally insensitive to the halo model uncertainties in 
the above range of a.) 

By contrast, for Oo ^0.5 the shift is significant. In this 
range of Qo, ^'(c, Co) as a function of a and (3 is very flat 
and insensitive to shifts in the two parameters. Due to this 
circumstance, even a slight 'tilt' to the function 'I'(c, Co) due 
to a switch to another halo model is likely to lead to a large 
shift in the position of the function's maximum in the (q, /3) 
plane. This is closely related to the basic reasons for large 
statistical errors for ao ^ 0.5 that we discussed in the last 
section. From the point of view of the inference of a and /3, 
the bias due to the uncertainty of the halo model would in 
this case only compound already very large statistical errors. 
The same holds for large negative values of a (< —2.5). 

In addition to the effect on the mass function determi- 
nation, MHO distribution and kinematics uncertainties will 
bear on the determination of the density and total mass in 
the massive objects' halo. If e = const, the two integrations 
in the total rate (E2p can be performed in the reversed order 



dT T 



2(q + 1) 



0/3/2 



10-/3/2 



dy y"F{y) 



J\/i^o/y 10- f/'^ 



1 f° 
= ^M^'/'C^Ma + 1/2) 



F{y)y ^'^dy 



and the total rate takes on the form 



V = Kpo--S.{a,l3) 



F{y)y-^'^dy 



(26) 



(27) 



where K \s a. constant independent of either the halo model 
or the MHO mass function and 
C^(a + 1/2) 



(28) 



v/C^(a)CM« + 1) 

Obviously [see equation (^^], F{y) depends only on the halo 
model but not on the parameters of the mass function. It also 
turns out that the equality H(a, /3)/H(ao, /3o) = 1 [a{ao,Po) 
and /3(qo, /3o) are again the best fits obtained with the 'false' 
halo model] is satisfied with better than 1% accuracy for the 
range of a and (5 discussed in this paper. Since the inferred 
fL, shown in Fig. 9, depends only very weakly on a and 13 it 
follows that the ratio (/5o)biascd/(Po)rcai is practically inde- 
pendent of a and /3. 

In the case discussed in this section so far, where the 
CS halo is chosen as the 'real' one and the SIS as the halo 
model used in the inference of the mass function, we ob- 
tain (po)biascd/(/9o)rcai — 0.5. Although the inferred value 
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Figure 10. The acceptable fraction (bottom) of best fit curves 
P(T) based on the 'real' CS halo (solid line) and 'false' SIS halo 
(dottes line) as functions of Kolmogorov-Smirnov confidence lev- 
els for N = 100, N = 1000 and N = 10000 detected events. The 
'real' mass function is ;U = 0.4, a = —2 and /3 = 2. In the upper 
panels the solid lines give the 'probability' that if a. CS-based best 
fit is accepted at the given confidence level, the SIS-based best 
fit will also pass the Kolmogorov-Smirnov test. The dotted lines 
show the probability of acceptance of a CS-based fit given that 
the SIS-based fit is acceptable. 



of the local MHO halo density Po is only half the actual 

one, the much slower fall-off of the SIS density at larger 
distances leads to the inferred total halo mass, Mtot = 
Att /^''"° po{r/RQ)-^r^dr, between the Sun and the LMC 
being about twice the actual amount. 

Since both the isothermal sphere and the 'concentrated' 
sphere can be viewed as plausible models of the massive ob- 
jects' halo, the parameter shifts evaluated so fax in this sec- 
tion provide an estimate of the magnitude of error due to 
halo structure uncertainty. As wo have stated above, due to 
the difficulty of distinguishing among different halo struc- 
tures on the basis of event duration measurements only, it 
would appear unlikely that these (systematic) errors should 
be possible to eliminate. 

On the other hand, we have noticed in the early part 
of this section that for a in the range most conducive to 
mass function parameter inference (a ~ —1-5), there is a 
residual difference between the differential event rate based 
on one halo model and its best match based on a different 
halo structure (upper panel of Fig. 8). Can this difference 
be exploited in a realistic statistical inference in order to 
reduce the effect of the halo structure uncertainty? Specifi- 
cally, with how many detected events do we need to sample 
the differential rate so that the small difference can be recog- 
nised? 

In order to answer this question we perform a number 
(400) of Monte-Carlo simulations of the mass function pa- 
rameters' statistical inference. Again, we assume that the 
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Figure 11. '1<t' confidence intervals (obtained by simulations 
where all eight parameters are varied in the maximum likelihood 
method) as functions of the number of detected events N. The 
dotted lines give the values of the 'real' parameters. 



'real' halo is CS, po = 0.4, a = —2, and = 2, generate a 
given number of events (AT = 100, 1000 or 10000) and ob- 
tain maximum likelihood fits based in turns on the CS and 
the ('false') SIS halo. In Fig. 10 we show the results of the 
Kolmogorov-Smirnov test of acceptability of the maximum 
likelihood fits (beised on the two different halos) as models 
for the observed data. 

At A = 100 it makes virtually no difference which halo 
model is used as a basis for statistical inference; the small 
discrepancy between the two event rate curves observed in 
Fig. 8 is completely swamped by the statistical noise. On 
the other hand, for N = 1000 the difference starts to be 
significant, although the probability of confusion between 
the two halos in any single data realisation is still high. At 
N = 10000 the ambiguity is completely resolved. Obviously, 
it is at about N = 1000 that we may first start discerning the 
halo structure effects and thus hope to be able to reduce the 
(halo model-induced) errors in the mass function parameters 
estimated earlier in this section. 

This conclusion is borne out by Monte-Carlo simula- 
tions where all eight parameters (7, CTo, (t+, fo, I, Ji, a and 
j3) arc varied with a uniform prior to obtain maximum- 
likelihood fits to events distributed according to the differen- 
tial event rate whose parameters are indicated by the dotted 
lines in Fig. 11. The vertical bars show 'Icr' intervals for in- 
ferred values of the parameters. [The inferred po and Mtot 
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are simply read out from the event rate ( p^ given a fixed 
event number of events and observation time witfiout allow- 
ing for the inevitable Poisson count noise.] 

As we concluded earlier in this section, for Oo — —2 the 
effects of the halo uncertainty on a and f3 are rather weak 
and these parameters are obtained with a good accuracy 
from between 100 and 1000 events. The halo model uncer- 
tainty adds little to the errors estimated in the last section. 
On the other hand, the error in fl falls bellow the level esti- 
mated above in this section only at A'^ ~ 1000. Notice that 
at A'^ = 10000 the error is larger by a factor of about 10 than 
the value obtained with a fixed halo model (see Fig. 5). 

The improvement at A'^ ~ 1000 is not necessarily re- 
flected in an accurate knowledge of the structure of the halo: 
for instance, Vo and I, and thus the velocity profile, are not 
accurately determined even for the largest A'^'s shown in Fig. 
11. It is rather that for A' near 1000 the halo is constrained 
just enough to allow a more reliable mass function inference; 
the small gap between the two curves of the upper panel of 
Fig. 8 can thus be much reduced with a rather broad range 
of best-fit halo models. 

Probably the most important quantities characterising 
the halo, its local density po and total mass between the 
Sun and LMC, Mtot (themselves functions of the above eight 
parameters and the observed event rate) clearly gain in ac- 
curacy with increasing A'' (see the two top panels of Fig. 
11). However, it should be pointed out that only at around 
TV — 10000 is the error due to halo structure uncertainty 
reduced to roughly the level of the simple Poisson count 
fluctuation l/\/6 ~ 40% associated with the total number 
of (reliable) LMC events detected by the time of this writing. 
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Appendix: Bias, error and the maximum likelihood 
method 

In this appendix we summarise some known mathematical 
results pertinent to the problem of estimating parameters of 
a pro babilitv distri bution, given a set of measurements [see 
also dCramer 1946| )]. 

Suppose we have a set of A' measurement results x = 
{xi,X2, a;jv}. Suppose also that the measurements are 
distributed according to the probability distribution f(x\c) 
[J fdx — 1), where c denotes a set of p parameters which are 
a priori unknown. Estimator c(x) is then a function of the 
measurement results that can serve as a reasonable estimate 
of the values of the underlying parameters c. 

Any statistical inference is, in principle, subject to error 
and bias in the inferred parameters. For each parameter 
the variance of estimation is defined as 



ml) 



(Al) 



where E denotes the expectation value of an ensemble whose 
each member consists of A'' measurements under the same 
conditions; c,i are the values of parameters estimated from 
the n measurements in a member of the ensemble. On the 
other hand, the bias 



(A2) 



is the systematic departure of estimated parameters from 
the 'true' parameters c. 

Given the dependence of bias on the the underlying pa- 
rameters, the errors of estimation are bounded from below 
by the Cramer (also known as Frechet-Cramer-Rao) limit, 
as we derive in the following. 

Denoting J'(x lc) = /(a:;i|c)/(a;2|c) ■ ■ • /(a::jvjc), the def- 
inition of bias ( A2 ) can be rewritten as 



c^(x)F(x|c)d"x ^ c^ + bf_, 



(A3) 



Differentiating the above equation with respect to parameter 
we obtain 

E{c^l,,) = E [(c^ - E{c^)) I,,] = 5^, + 6^,,., (A4) 
where we have used 

/(x,c) EE InF(xjc), (A5) 
and the fact that 



Eil.^.) = N / [lnf{x\c)l^fix\c)dx = 0, 



(A6) 
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due to the normalisation of /. Convolving the matrix equa- 
tion (A4) with arbitrary p-dimensional vectors and Vi, 
and using the Schwartz inequality, we obtain 



(A7) 



where we have used Einstein's convention of summation over 
repeated indices and defined the variance matrix 



and the information matrix 



(A8) 



r{") 



= N j ilnf{x\c))^^{lnf(x\c))^^fix\c)dx. (A9) 

It is convenient at this point to simplify notation by in- 
troducing Dirac-style bra/ket notation. Thus, \u > denotes 
the column Ug wh ile < u\ denotes the corresponding row. 
The equation (A7) can now be rewritten as 

iT,i ^ {< u\v > + < u\B\v >f 
<u\V\u> > ' — '-^ '—, (AlO) 

~ < V \I\v > 

where i?^^ = Since |« > is arbitrary, we can find the 

maximum of the right-hand side in the above equation (for 
a given \u >) if we require that the first derivative with 
respect to |t; > be zero. This condition leads to 



<ttl(l + B) 1- 



\v >< v\ 
< v\I\v > 



0. 



(All) 



It is s traig htforward to see that the vector satisfying condi- 
tion (|A11^ is 



(A12) 



\v> = r'' {i + b'^)\u>, 

which by substitution in equation (AlO) gives the Cramer 
inequality 



<u\V\u> > <u\{l + B)I'^ (i + B'^)\u>, 



(A13) 



where | it > is an arbitrary p-dimensional vector. 

In general, the Cramer inequality is only of limited util- 
ity: the actual estimator we are using may have variance that 
far exceeds the Cramer limit. In addition, it may be diffi- 
cult to determine the bias, which is needed to compute the 
Cramer limit in the first place. However, as we show below, 
the method of maximum likelihood is asymptotically (for A'' 
large enough) unbiased and approaches the Cramer limit. 

In the maximum likelihood method one adopts as esti- 
mates c of the parameters determining distribution /(a;|c) 
those values of the parameters that yield the maximum of 
Z(xjc) [ see equation (AE)] given a set x of A'' measurements. 
Expanding the maximum likelihood condition in the vicin- 
ity of the true values of the parameters c, = I, pic) = 
^,m(c) + (ci/ — Cv)l, fj,v{c) we obtain the 'shift' from the true 
parameters 

Cp-Cp^-[if,^r^l,^, (A14) 

where denotes the inverse matrix of [/,^^]. For n 

large enough, can be approximated by its ensemble 

average 

[I,,.]-' ^ E{[1,,^]-'} ^ [E {I,,,.)]-' = [C\c)] , (A15) 

where we have substituted Eil^^v) — —Eil^^l.^), which is a 
straightforward consequence of the normalisation of /. Due 



to equati on (|A6| ) the bias asymptotically vanishes, while the 
|A^) approaches the Cramer limit 

-'^{['.Mp] ^ ['.I'D-] ^ Iplcr^ 



variance 



[4^'(c)]-^[e'(c)]- 



E \lpla 



(A16) 



where, in the last equality we used equation (A9). We de- 
note the corresponding 'Cramer' deviation limit as AcC^ = 

In the above otline of the maximum likelihood method 
we assumed that all uncertainty regarding the underlying 
distribution function /(x|c) stems from the uncertainty of 
the values of a well defined, finite set of parameters, while the 
functional dependence /(x|c) is known. In the more general 
case, the functional dependence is not known accurately. We 
can then ask the following question: if we happen to be using 
a 'wrong' functional dependence /(x|c) = /o(x|c) -f(5/(xjc) 
instead of the correct function /o(x|c), what will be the bias 
(A2), i.e., the average departure from the true parameters. 
Obviously, in this case the bias need not be asymptotically 
zero. 

More specifically, the maximum likelihood method looks 
for the maximum of /(a;|c) = E^q ln/(a;i|c), where x are 
results of measurements distributed according to the true 
function fo, i.e., it seeks the values c for which /(a;|c),^ = 0. 
A reasonable approach to estimating the average value of c 
that this method would give for large n, may be to look for 
those values of c that maximize 



(A17) 



*!'(£, c) = j fo(x\c)\nf{x\c)dx. 

In other words, the requirement is 
a*(c,c)/9c^ =0. 



(A18) 



Note that if / = fo, this condition simply implies c = c. 

Unfortunately, this prescription finds the maximum of 
the average of i(x|c) which quite obviously need not be the 
same as the required average of those c that max;imize Z(x|c) 
for different measurement sets x. Still, for small enough 5f 
the two quantities are very close to each other. Indeed, in 
this case 



l(x\c) 



Si=o ^'n[fo{xi\c) + Sf{xi\c)] 
S£o {\nfo + 5f/U). 



From equation (A14) we then obtain 



hp = E{cp - Cp) « [Ip^r^ E fj-Sf, - ^-Ifo,^ 



(A19) 



(A20) 



where we have kept only terms linear in 5/ and used the 
fact that E{lo) = 0. On the other hand, the condition 
d'^{c,c)/dcp = gives 







d 

fodx {x\c) — 



ln/,(x|c) + ^ 

Jo 



fodx 



(ln/o(a;|c)),,i -|- (c^ - Ci,)(ln/o(3::|c)),^„ 



Jo Jo 



(A21) 



which in the limit of large n leads to equation (A2C) 
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Although, strictly speaking, condition (A18) holds only 
for small 5f, its cautious application can yield a reasonable 
estimate of bias even away from this strict limit. 



